MCN: Modulated Convolutional Network


creased. In particular, to alleviate the disturbance caused by the binarization process, a center loss is designed to incorporate intra-class compactness with the quantization loss and the filter loss. The red arrows show the back-propagation process. By considering the filter loss, center loss, and softmax loss in a unified framework, we achieve much better performance than state-of-the-art binarized models. Most importantly, our MCN model is highly compressed and performs comparably to the well-known full-precision ResNets and WideResNets.

As shown in Fig. 3.1, M-Filters and weights can be jointly optimized end-to-end, resulting in a compact and portable learning architecture. Owing to its low model complexity, such an architecture is less prone to overfitting and is suitable for resource-constrained environments. Specifically, our MCNs reduce the storage required by a full-precision model by a factor of 32, since each 32-bit floating-point weight is replaced by a single bit, while achieving the best performance among existing binarized-filter-based CNNs, even approximating full-precision filters. In addition, the number of model parameters to be optimized is significantly reduced, yielding a computationally efficient CNN model.

3.4.1 Forward Propagation with Modulation

We first elaborate on MCNs as vanilla BNNs with only binarized weights, and design the specific convolutional filters used in our MCNs. Across all layers, we deploy 3D filters of size K × W × W (one filter), where each filter has K planes and each plane is a W × W 2D filter. To use such filters, we extend the input channels of the network, e.g., from RGB to RRRR or (RGB+X) with K = 4, where X denotes any channel. Note that for gray-level images, we only use one channel. Doing so allows us to implement our MCNs quickly on existing deep-learning platforms. After this extension, we directly deploy our filters in the convolution process; the details of the MCN convolution are illustrated in Fig. 3.2(b).
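To make the channel extension concrete, the following PyTorch-style sketch (our own illustration; the variable names and the choice of replicated channel are assumptions, not the authors' code) extends a 3-channel RGB input to K = 4 channels and binarizes one K × W × W filter:

```python
import torch

# Hypothetical sketch of the channel extension described above.
K, W = 4, 3
rgb = torch.randn(1, 3, 32, 32)           # a batch with one RGB image

# "RGB + X" with K = 4: here X simply replicates the R channel.
x = torch.cat([rgb, rgb[:, :1]], dim=1)   # shape (1, 4, 32, 32)

# One MCN filter: K planes, each a W x W 2D filter.
c = torch.randn(K, W, W)

# Vanilla-BNN-style weight binarization to {-1, +1}.
c_hat = torch.where(c >= 0, torch.ones_like(c), -torch.ones_like(c))
```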

To reconstruct unbinarized filters, we introduce a modulated process based on M-Filters and binarized filters. An M-Filter is a matrix that serves as the weight of binarized filters and is also of size K × W × W. Let $M_j$ be the j-th plane of an M-Filter. We define the operation $\otimes$ for a given layer as follows:

\[
\hat{C}_i \otimes M = \sum_{j=1}^{K} \hat{C}_i \circ M'_j, \qquad (3.12)
\]

where $M'_j = (M_j, \ldots, M_j)$ is a 3D matrix built from K copies of the 2D matrix $M_j$, with $j = 1, \ldots, K$, and $\circ$ is the element-wise multiplication operator, also termed the Schur product. In Eq. 3.12, $M$ is a learned weight matrix used to reconstruct the convolutional filters $C_i$ from $\hat{C}_i$ and the operation $\otimes$, and it leads to the filter loss in Eq. 3.18. An example of filter modulation is shown in Fig. 3.2(a). In addition, the operation $\circ$ results in a new matrix (named the reconstructed filter), i.e., $\hat{C}_i \circ M'_j$, which is elaborated in the following. We define:

\[
Q_{ij} = \hat{C}_i \circ M'_j, \qquad (3.13)
\]
\[
Q_i = \{Q_{i1}, \ldots, Q_{iK}\}. \qquad (3.14)
\]
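A minimal sketch of Eqs. 3.12–3.14 via broadcasting (our own illustration for a single filter; tensor names are hypothetical):

```python
import torch

K, W = 4, 3
c0 = torch.randn(K, W, W)
c_hat = torch.where(c0 >= 0, torch.ones_like(c0), -torch.ones_like(c0))
m = torch.rand(K, W, W)                       # M-Filter: planes M_1, ..., M_K

# M'_j = (M_j, ..., M_j): K copies of the 2D plane M_j stacked into 3D.
m_prime = m.unsqueeze(1).expand(K, K, W, W)   # m_prime[j] has shape (K, W, W)

# Eq. 3.13: Q_ij = C_hat_i o M'_j (element-wise / Schur product).
q = c_hat.unsqueeze(0) * m_prime              # q[j] = Q_ij, shape (K, K, W, W)

# Eq. 3.12: C_hat_i (x) M = sum over j of C_hat_i o M'_j.
recon = q.sum(dim=0)                          # reconstructed filter, (K, W, W)
```

Here `q` plays the role of the set $Q_i = \{Q_{i1}, \ldots, Q_{iK}\}$ from Eq. 3.14, and `recon` is the reconstructed filter of Eq. 3.12.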

In testing, $Q_i$ is not predefined but is calculated from Eq. 3.13; an example is shown in Fig. 3.2(a). $Q_i$ is introduced to approximate the unbinarized filters $w_i$ so as to alleviate the information loss caused by the binarization process. In addition, we further require $M \geq 0$ to simplify the reconstruction process.
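One simple way to enforce this constraint (our assumption; the original may realize $M \geq 0$ differently) is to clamp the M-Filter after each optimizer step:

```python
# Keep the learned M-Filter non-negative after each update (sketch).
with torch.no_grad():
    m.clamp_(min=0.0)
```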